[57] ABSTRACT

A method and apparatus for recognition of handwritten input
is disclosed in which handwritten input, composed of a
sequence of (x, y, pen) points, is preprocessed into a
sequence of strokes. A short list of candidate characters that
are likely matches for the handwritten input is determined by
finding a fast matching distance between the input sequence
of strokes and a sequence of strokes representing each
candidate character of a large character set, where the
sequence of strokes for each candidate character is derived
from statistical analysis of empirical data. A final sorted
list of candidate characters that are likely matches for the
handwritten input is determined by finding a detailed
matching distance between the input sequence of strokes and
the sequence of strokes for each candidate character of the
short list. A final selectable list of candidate characters is
presented to a user.

36 Claims, 8 Drawing Sheets 
METHOD AND APPARATUS FOR
CHARACTER RECOGNITION OF 
HANDWRITTEN INPUT 

This is a continuation of application Ser. No. 08/456,502,
filed Jun. 5, 1995 and now abandoned.

FIELD OF THE INVENTION 

This invention relates generally to handwriting
recognition, and more particularly to recognition of large
character sets where each character includes one or more
strokes.

BACKGROUND OF THE INVENTION 

Machine recognition of human handwriting is a very
difficult problem and, with the recent explosion of pen-based
computing and electronic devices, has become an important
problem to be addressed. There exist various different
computing and electronic devices that accept handwritten
input. So-called pen-based products, for example
computers, personal digital assistants, and the like,
typically have a touch sensitive screen upon which a user
can impose handwriting. These devices then function to
digitize the handwritten input. Other devices, such as
computers, advanced telephones, digital televisions, and
other information processing devices, can access a digitizing
tablet which can accept handwritten input. Still other
devices can receive handwritten character input by means of
a fax, scanned input, electronic mail, or other electronic
transmission of data. These devices process the information
and attempt to recognize the information content of the
handwritten input. Typically, the device then displays that
information to the user for purposes of feedback, correction
of errors in the processing, and for recognition of the
handwritten character input.

There exist various approaches for recognition of
handwritten input when the recognition is for character sets
having a limited finite number of characters, typically under
a hundred. However, such approaches often do not work as
well for character sets having large numbers of varied
complex characters. Examples of large character sets that
have been difficult to quickly and accurately recognize
through recognition of handwritten input are several of the
Asian ideographic character/symbol languages, such as
Chinese, simplified and traditional, Japanese, and other
languages having large character sets. Some languages, such
as simplified Chinese, consist of several thousand characters.

Traditional methods of inputting data and text, such as
keyboard entry, are often very difficult for languages based
on these types of large character sets, in part because of the
large number and complexity of the characters.
Additionally, many such languages resort to phonetic-based
representations using Western characters in
order to enter the characters with a keyboard. Hence,
keyboard-type entry of such characters is difficult. An
example of the difficulty of keyboard entry for a large
character set based language is keyboard entry of the
Chinese language. To enter data, or text, in Chinese via a
keyboard, the language is first Romanized. Western
characters, such as the English, Anglo-Saxon alphabet, are
used to phonetically represent the characters of the Chinese
language. This is referred to as Pin-yin. Therefore, a
person wishing to enter data or text in Chinese through a
keyboard must first know Pin-yin and the
corresponding English character representation for the
phonetic equivalent of the Chinese character they are trying
to enter via the keyboard.



Another difficulty encountered with recognition of
handwritten input of data, or text, based upon a language having
a large character set is that diversity among various persons is
great because of the large number of characters and the
complexity of the characters themselves. Additionally, many
such languages have one or more forms of
representing the same character, similar to print and cursive forms
for the English, Anglo-Saxon alphabet. Additionally, such
languages may have homophones; for example, the Chinese
language has numerous homophones, words that are
pronounced the same but have different meanings and written
forms. Hence, the same Pin-yin can refer to a multiplicity of
characters, and the person entering Chinese character data
must often select from a list of possible choices.

Typically, techniques used for handwriting recognition of
the English, Anglo-Saxon alphabet character set, or other such
limited finite character sets of under a hundred characters, do
not produce accurate results for languages having large
character sets of several hundred or several thousand varied
complex characters. Many of the techniques used for handwriting
recognition of small character set languages are very slow when
used for large character set languages.

Therefore, because of the increasing use of pen-based
electronic input devices and the difficulty of keyboard entry for
large, complex character set languages, a need exists for a
method and apparatus for recognition of handwritten input
for complex, large character set languages that is quick,
accurate, and easy to use.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 illustrates a block diagram of operation of a
preferred embodiment of the present invention.

FIG. 1a illustrates a front, left perspective view of an
illustrative pen-based microprocessor entry device suitable
to receive input in accordance with the present invention.

FIG. 2 illustrates a block diagram detailing operation of
a preferred embodiment of the present invention.

FIG. 3 illustrates a format of a preferred embodiment of
reference templates in accordance with the present
invention.

FIG. 4 illustrates a block diagram of operation of a
preferred embodiment of character matching in accordance
with the present invention.

FIG. 5 illustrates a block diagram of operation of a
preferred embodiment of fast matching in accordance with the
present invention.

FIG. 6 illustrates a flow diagram of operation detailing a
preferred embodiment of fast matching in accordance with
the present invention.

FIG. 7 illustrates graphically a preferred embodiment of
fast matching in accordance with the present invention.

FIG. 8 illustrates graphically a preferred embodiment of
fast matching in accordance with the present invention.

FIG. 9 illustrates a block diagram for a preferred
embodiment of detailed matching in accordance with the present
invention.

FIG. 10 illustrates a flow diagram of a preferred
embodiment of detailed matching in accordance with the present
invention.

FIG. 11 illustrates a flow diagram of a preferred
embodiment of detailed matching in accordance with the present
invention.

FIG. 12 illustrates a front, left perspective view of an
illustrative pen-based electronic entry device having a
microprocessor upon which handwritten input has been
received and a corresponding detailed matching has been
displayed in accordance with a preferred embodiment of the
present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT

Generally, the present invention relates to a method and
apparatus for recognition of handwritten input; and
preferably the present invention relates to a method and
apparatus for recognition of handwritten input representing
one or more characters selected from a language or compilation
of data having a large complex set of characters where each
character includes one or more strokes.

Pursuant to a preferred embodiment of the present
invention, candidate characters in support of a handwriting
recognition method and apparatus of the present invention
are developed through the compilation and statistical
analysis of empirical data compiled from hundreds of samples
of actual handwritten characters. Candidate characters
produced through the development of templates derived from
the compilation and statistical analysis of the empirical data
are selectable as the recognized character of the handwritten
input.

Referring now to the Figures, FIGS. 1, 1a and 12 illustrate
general operation of a method and apparatus in accordance
with a preferred embodiment of the present invention. With
reference to FIGS. 1a and 12, an example of a pen-based
electronic entry device is illustrated. A personal digital
assistant is illustrated as generally depicted by the reference
numeral 10. The personal digital assistant (10) depicted
constitutes a generic representation; typically such devices
include a housing (12) and a touch screen (18) upon which
input can be handwritten using an appropriate hand-manipulated
stylus (15). Such devices typically include one or more
microprocessors or other digital processing devices. As
such, these devices comprise computational platforms that
can be readily adapted in accordance with the teachings
presented herein. It should be understood that, while such
personal digital assistants comprise a ready platform to
accommodate the practice of the applicant's teachings, the
teachings presented herein may be practiced in a variety of
other operating environments as well. Some examples of
such environments include, but are not limited to, the
following: computers or other electronic entry devices with
digitizing screens, or connected to a digitizing input surface,
or capable of receiving faxed, scanned, or other electronic
input; digital or interactive televisions; modems; telephones;
pagers; or other systems with the ability to capture
handwritten input and process it.

Referring now to FIG. 1, a block diagram of a preferred
embodiment of recognizing handwritten input in accordance
with the present invention is illustrated. Handwritten input
in accordance with a preferred embodiment of the present
invention is represented as a sequence of (x, y, pen) values
where x and y represent the (x, y) coordinates of an ink point
with respect to some coordinate system and pen is a binary
variable that represents the pen state with respect to the input
surface of a device. A pen value can be either a pen-up (pen
is not in contact with the writing or input surface) or a
pen-down (pen is in contact with the writing or input
surface). In accordance with the present invention,
handwritten input may be captured electronically using a
digitizing tablet, or alternatively may be derived from a scanned
or faxed image through a process of line detection in the
image. Such methods of capturing handwritten input
electronically are understood in the art. In a preferred method,
handwritten input is accepted by a device, such as a personal
digital assistant (PDA) or other device. Other devices that
function to receive handwritten input include, but are not
limited to, the following: computers, modems, pagers,
telephones, digital televisions, interactive televisions,
devices having a digitizing tablet, facsimile devices,
scanning devices, and other devices with the ability to capture
handwritten input.

In the invention, the handwritten input (ink) that
is presented to the recognizer corresponds to that of a single
character. If two or more characters need to be recognized,
the ink corresponding to each character must be supplied
to the recognizer separately in time and preferably in
the desired sequential order in order to determine the
identity of and preferred order of each of the characters. In
accordance with the present invention and illustrated in
FIGS. 1, 1a, and 12, generally the recognizer (102)
performs a series of operations on the input ink (104) and
produces a list of candidate characters (106) that correspond
to and represent the handwritten input (20). Preferably a list
of candidate characters (106) is provided from which
selection is made of the candidate character that most likely
corresponds to and represents the handwritten input. The list
may be variable in the number of candidate characters
presented to choose from. The candidate character that most
represents and corresponds to the handwritten input can then
be selected. The selection can occur through various
methods, including but not limited to such methods as user
selection, or language modeling, or the like. In accordance
with a preferred embodiment of the present invention the
recognizer of the present invention is adapted and
configured to recognize individual characters which are a part,
or subset, of large character sets where preferably each
character includes one or more strokes and the character set
comprises hundreds or even thousands of individual
characters, and more preferably whose individual characters
have a preponderance of straight line pieces. Examples of
such large character sets include but are not limited to the
ideographic character symbols of several of the Asian
languages, including but not limited to Chinese, Japanese,
etc. In accordance with a preferred method and embodiment
of the present invention, recognition of handwritten input in
accordance with the present invention is accomplished with
respect to the character set of the simplified Chinese
language, in particular, of the characters defining the
category of GB1 simplified Chinese characters.

Referring now to FIG. 2, a block diagram detailing
operation of a preferred method and apparatus is illustrated.
As shown in FIG. 2, a preferred embodiment of the present
invention includes access to a preprocessing module (122),
a character matching module (140), and a set of reference
templates (160). Preferably, the preprocessing module
converts the handwritten input (20), or raw input data, i.e. a
sequence of (x, y, pen) values (104), into a sequence of
"strokes." In accordance with the present invention a
"stroke" is defined as a basic unit of pen movement. Any
handwritten input can then be represented as a sequence of
strokes. A preferred representation for a "stroke" is a straight
line parametrized by using a four dimensional vector
having the following dimensions: 1) mx, 2) my, 3) len, 4)
ang, where mx is the x coordinate of the mid point of the
stroke, my is the y coordinate of the mid point of the stroke,
len is the length of the straight line stroke, and ang is the
angle made by the straight line stroke with respect to some
reference axis (like the x axis). Alternatively, other
definitions of "stroke" and other parametrizations of a straight
line "stroke" may be used.
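As a minimal sketch of the preferred stroke representation, assume a stroke is already available as a straight segment with two known endpoints (segmenting raw ink into such segments is the job of the preprocessing module and is not shown). Under that assumption the four dimensional vector could be computed as follows. The function name `stroke_vector`, the use of degrees, and the [0, 360) angle range measured from the x axis are illustrative assumptions, not taken from the text.

```python
import math


def stroke_vector(x1, y1, x2, y2):
    """Return [mx, my, len, ang] for the segment (x1, y1) -> (x2, y2).

    Hypothetical helper: the angle is assumed to be measured from the
    x axis, in degrees, and folded into the range [0, 360).
    """
    mx = (x1 + x2) / 2.0                      # x coordinate of the mid point
    my = (y1 + y2) / 2.0                      # y coordinate of the mid point
    length = math.hypot(x2 - x1, y2 - y1)     # length of the straight line
    ang = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0
    return [mx, my, length, ang]
```

For example, `stroke_vector(0, 0, 3, 4)` describes a segment of length 5 whose mid point is (1.5, 2.0).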
Such alternative definitions and parametrizations may be used
in accordance with the preprocessing module (122) of the
present invention. The preprocessing
module (122) reduces the amount of data that needs to be
processed by the recognizer (102) and it also serves to
correlate multiple instances of the same handwritten
character that look similar, and provides the recognizer (102)
with a preferred quality of input. An embodiment of the
preprocessing module is described in related U.S. patent
application entitled METHOD AND MICROPROCESSOR
FOR PREPROCESSING HANDWRITING HAVING
CHARACTERS COMPOSED OF A PREPONDERANCE
OF STRAIGHT LINE SEGMENTS filed concurrently,
and on the same day as the present application, having U.S.
Ser. No. 08/463,366.

In the preferred method and embodiment of the present
invention, the recognizer includes the character matching
module (140). Generally, the character matching module
(140) correlates and compares the handwritten input of the
present invention to one or more sets of stored reference
templates (160) and then provides a corresponding list of
preferred candidate characters that have the highest
probability of representing the original handwritten input (20).

Generally, the reference templates (160) of the present
invention include one or more templates or sets of templates
for each character in the large character set. For a preferred
embodiment of the present invention the recognizer (102)
was implemented for Simplified Chinese Characters (the
character set used in Mainland China) and has a vocabulary size
of 3755 characters. This character set is commonly referred
to as GB1, where simplified Chinese Characters consist of
character sets GB1 and GB2. In the preferred embodiment
of the present invention several templates are referred to, or
accessed, by the recognizer (102) for each character in the
preferred GB1 Character set vocabulary. The multiple
templates for each character are provided to balance diversity
among writer to writer variations of the complex characters;
and balance diversity of forms, i.e. print vs cursive, of
representing the same character even among the same writer.
Pursuant to a preferred embodiment of the present invention
the reference templates (160) of candidate characters are
developed from empirical data. The empirical data is
compiled and statistical analysis is performed on hundreds of
samples of actual handwritten input representing each of the
potential characters to be recognized. These candidate
characters are then provided in accordance with the present
invention and are selectable as the corresponding and
representative recognized character of the handwritten input.

A preferred format of the reference templates (160) is
similar to that of the preprocessed ink that is produced by the
preprocessing module (122). Accordingly, each "stroke" is
parametrized in some fashion. In accordance with the
present invention a "stroke" is simply a straight line
parameterized in some fashion. As discussed previously, a
preferred way of parameterizing a "stroke" is by using a four
dimensional vector having the following dimensions: 1) mx, 2)
my, 3) len, 4) ang, where mx is the x coordinate of the mid
point of the stroke, my is the y coordinate of the mid point
of the stroke, len is the length of the "stroke" and ang is the
angle made by the "stroke" with respect to some reference
axis (like the x axis). Referring to the preferred reference
templates accessed by the present invention, each "stroke", in
addition to being parametrized in some fashion, has a weight
associated with it which indicates how important the stroke
is to describe the character; the weight associated with each
stroke is determined from analysis of the empirical data.

The reference templates are obtained by gathering
empirical data and performing statistical analysis. The format for
storing the reference templates in the present invention is
illustrated in FIG. 3, where the number of characters in the
vocabulary is denoted by M. For each character (for example,
character 1 marked 302), the number of templates (304) is
stored. For each template (306), the number of "strokes"
(308) is then stored. For each "stroke" (310), a parametrized
description of the stroke and the weight associated with the
stroke are stored (312). In 312, the preferred parametrization,
i.e., the four dimensional vector [mx, my, len, ang], is shown.
However, other parametrizations may also be used.
Alternative parameterizations of "stroke" may be used in
accordance with both the preprocessing module (122) of the
present invention and the reference templates (160) of the
present invention. However, in a most preferred embodiment of
the present invention, the parameterization of the "stroke" is
the same for the preprocessing module (122) and for the
reference templates (160).

Referring now to FIGS. 1, 2, 4, and 5, FIG. 4 illustrates
a block diagram of operation of a preferred embodiment of
character matching. In the preferred embodiment
illustrated, the character matching module (140) as shown in
FIG. 2 includes two distinct components. The components,
a fast matching module (600) and a detailed matching
module (1000) of the character matching module (140) that
resides within recognizer (102), are shown in FIGS. 2 and 4.
Preferably, the input to the character matching module is the
sequence of straight line strokes (125) that is produced by
the selected preprocessing module (122). The "strokes"
(125) represent the preprocessed handwritten input (20). The
first stage, or component, of the character matching module
(140) is the fast matching module (600). Generally, the fast
matching module (600) component functions to quickly
provide a short list (625) of candidate characters that most
likely includes a corresponding and representative match to
the handwritten input (20) as illustrated in FIG. 5. The
second stage or component of the character matching
module (140) is the detailed matching module (1000). Generally,
the detailed matching module (1000) functions to provide a
detailed matching of the handwritten input (20) with only
those reference templates (160) of candidate characters
provided on the short list (625) produced by the fast
matching module (600). Preferably, the short list (625) produced
from the fast matching module (600) ensures that the
detailed matching module can quickly and accurately
provide a corresponding representative candidate character to
the handwritten input. More preferably, the combination of
the fast matching module (600) and the detailed matching
module (1000) provides a method and apparatus for
recognition of handwritten input that can be done in real time
(i.e. the amount of time it takes to write, or input, a character).

Referring to FIG. 5, a block diagram of the fast matching
module (600) is shown. Generally, the input to the fast
matching module is the above discussed preprocessed
handwritten input that is described by a sequence of straight line
strokes (125). The output of the fast matching module is a
short list of candidate characters (625) that most probably
corresponds to and represents, or matches, the handwritten
input (20).

Referring now to FIG. 6, a flow chart detailing the
operation of the fast matching module is shown. In FIG. 6,
the index i refers to the i th character of the preferred
character set (602) and the index j (620) refers to the j th
template of the character. The symbol Ti is the number of
templates for the character whose index is i, and the quantity
ms is the minimum string matching distance between the
input and all the templates of one character. The minimum
matching distance is initialized to a large number at the start
of each new character (604). Fast matching starts by
attempting to match the input with the first template of the
first character in the vocabulary (i.e. the character whose
index is 1). The absolute difference d between the number of
straight line strokes in the input and the number of straight
line strokes in the jth template of character i is then
computed (606). The difference d is compared against a
threshold (608). The threshold can be a fixed one or can be a
variable one that depends on the number of strokes in the
input and the template. A preferred threshold to use is
computed as thresh=(number of strokes in the input + number of
strokes in the template)/10 + 1. If the difference d is less than
the threshold, a fast string matching distance s is computed
between the input and the jth template of character i (610).
The details of obtaining the fast matching distance
will be given in the following few paragraphs. The minimum
string matching distance ms is updated (612) based on the
newly computed fast string matching distance s (610). The
steps 606, 608, 610, and 612 are repeated until all the Ti
templates for character i are exhausted. Note that if the
difference d is greater than the threshold, steps 610 and 612
are omitted and the next template for the character is
considered. Once the minimum string matching distance
between the input and the templates for character i has been
computed, the current shortlist of characters that best match
the input is updated (618). In addition to the shortlist of
candidate characters that best match the input, the minimum
fast string matching distances for matching the input with
the templates for each character in the shortlist are also
stored. The shortlist is sorted using the minimum fast string
matching distances. Given a new character index and a
corresponding minimum fast string matching score, the
current shortlist is updated by inserting the new character
index into the shortlist at the location dictated by the new
minimum fast string matching distance. If the new minimum
fast string matching distance is greater than the last entry in
the current list of minimum fast string matching distances,
then the current shortlist does not need to be updated (622).
After updating the current short list of candidate characters
that best match the input, the next character in the
vocabulary, if one such exists, is considered by starting again
at step 604. After all the characters in the vocabulary have
been considered, the short list of candidate characters that
best match the input is forwarded to the detailed matching
stage of the character matching module. In FIG. 6, the
symbol M is used to denote the vocabulary size (i.e.
M=number of characters in the vocabulary).

In a preferred method and embodiment of the present
invention, the fast string matching in 610 is based on a single
stroke feature. However, more than one stroke feature may
be used for fast string matching. A preferred stroke feature
to use for fast string matching is the angle of the stroke.
Hence, the fast string matching distance is the distance
between two one dimensional strings. A one dimensional
string is simply a string of one dimensional quantities (like
a single stroke feature). Note that the lengths of the two
strings need not be the same. The technique used to compute
the fast string matching distance is illustrated using a simple
example. FIGS. 7 and 8 show how the fast string matching
distance between the strings S1=[10, 25, 15, 90, 120] and
S2=[15, 5, 100, 140] is computed. The length of string S1 is
5 and the length of string S2 is 4. To begin with, the first
element of string S1 (702) is paired with the first element of
string S2 (704) and the current score is set to the difference
between the first elements of the two strings. In this case, the
current score is 15-10=5 (see 802 in FIG. 8). At any given
time, let the mth element of string S2 be paired with the nth
element of string S1. To find the next best matching pair of
elements from the two strings, the three immediate
neighboring pairs (m+1,n), (m+1,n+1), and (m,n+1) are compared
(see 706). The pair that has the least distance is picked as the
next matching pair and the distance between the two
elements in the new pair is added to the current fast string
matching distance. In FIG. 8, the table 802 shows the first
elements of the two strings forming a matching pair with a
score of 5. To find the next matching pair of elements in the
two strings, the three pairs (5(m+1), 10(n)); (5(m+1),
25(n+1)); and (15(m), 25(n+1)) illustrated in FIGS. 1, 1a,
and 12 are considered. Of the three pairs, the pair (5,10) has
the least distance of 5. Hence the current pair is moved up
and the fast string matching score is updated (5+5=10) (see
table 804 in FIG. 8). The process of finding the best
matching pairs in the two strings is repeated until the last
element of string S1 is paired with the last element of string
S2. These steps are illustrated in tables 806, 808, 810, and
812. The accumulated fast string matching distance when
the last elements of the two strings are paired is the final fast
string matching distance. In 812, the final string matching
distance between the two strings S1 and S2 is 70. Table 814
in FIG. 8 shows the sequence of best matching pairs that was
used to compute the fast string matching distance between
the two strings S1 and S2.

FIG. 9 shows the computational blocks of the detailed
matching module. The inputs to the detailed matching
module are the preprocessed handwritten input (sequence of
strokes) and the shortlist of candidate characters that is
produced by the fast matching module. The output of the
detailed matching module (1000) is the final sorted list of
candidate characters that best match the input. The detailed
matching module comprises two major computational
blocks. The first block (902) finds a detailed matching
distance between the input and the templates for all the
characters included in the shortlist of candidate characters
produced by the fast matching module. The output of 902 is
a first detailed match list of candidate characters. The second
detailed matching block (904) re-sorts the first detailed
match list of candidate characters to produce the final
sorted list of candidate characters that best match the
handwritten input (106). FIG. 10 is a flow chart describing the
dynamic programming based matching module (902). In
FIG. 10, the index i refers to the ith entry in the shortlist of
candidate characters produced by the fast matching module
(1002). The index of the character stored as the i th entry in
the fast match short list is denoted by fi. The index j refers
to the j th template of character fi and T(fi) is the number of
templates stored for character fi. The symbol F is used to
denote the size of the fast match short list. The quantity ms
is the minimum dynamic programming matching distance
between the input and all the templates of one character. The
minimum dynamic programming matching distance is
initialized to a large number at the start of each new character
in the fast match shortlist (1004). Detailed matching starts
by attempting to match the input with the first template of
the first character in the fast match short list. A matching
distance s is computed between the input and the jth
template of character fi (see 1006) using the technique of
dynamic programming. The technique of dynamic
programming is known to one of ordinary skill in the art and can be
found in the paper by Sakoe and Chiba (H. Sakoe and S.
Chiba, "Dynamic Programming Algorithm Optimization for
Spoken Word Recognition" in Readings in Speech
Recognition, A. Waibel and K-F Lee, editors, Morgan
Kaufmann, San Mateo, Calif., USA, 1990). In the present
invention, dynamic programming is used to find a matching
distance between two sequences of "strokes".
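The two kinds of matching distance discussed above can be sketched side by side, as an illustration only. The function names are hypothetical. The greedy function follows the pairing procedure of FIGS. 7 and 8, assuming the element distance is a plain absolute difference and that ties are broken in the order the three neighboring pairs are listed in the text. The dynamic programming function assumes the textbook recurrence with the same three moves and a caller-supplied distance `dist`; the text itself only cites Sakoe and Chiba and does not spell out the recurrence.

```python
def fast_string_distance(s1, s2):
    """Greedy pairing distance between two one dimensional strings."""
    if not s1 or not s2:
        raise ValueError("both strings must be non-empty")
    n, m = 0, 0                      # n indexes s1, m indexes s2
    total = abs(s1[0] - s2[0])       # first elements are always paired
    while n < len(s1) - 1 or m < len(s2) - 1:
        candidates = []              # (distance, next n, next m)
        if m + 1 < len(s2):
            candidates.append((abs(s1[n] - s2[m + 1]), n, m + 1))
        if m + 1 < len(s2) and n + 1 < len(s1):
            candidates.append((abs(s1[n + 1] - s2[m + 1]), n + 1, m + 1))
        if n + 1 < len(s1):
            candidates.append((abs(s1[n + 1] - s2[m]), n + 1, m))
        # min() keeps the first of equal candidates (assumed tie rule)
        cost, n, m = min(candidates, key=lambda c: c[0])
        total += cost
    return total


def dp_distance(a, b, dist):
    """Minimum cumulative distance over all monotone pairings of a and b."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    prev = [inf] * len(b)
    for i, ai in enumerate(a):
        cur = [inf] * len(b)
        for j, bj in enumerate(b):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    prev[j],                        # advance in a only
                    cur[j - 1] if j > 0 else inf,   # advance in b only
                    prev[j - 1] if j > 0 else inf,  # advance in both
                )
            cur[j] = dist(ai, bj) + best
        prev = cur
    return prev[-1]
```

For the strings of FIGS. 7 and 8, `fast_string_distance([10, 25, 15, 90, 120], [15, 5, 100, 140])` returns 70, in agreement with table 812. Because the greedy pairing is one of the pairings the dynamic programming sketch searches over, its result is never smaller than `dp_distance` on the same inputs with the same element distance.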
The two sequences of "strokes" compared by dynamic
programming represent the preprocessed handwritten input
and a stored template of a character. In a
preferred method and embodiment of the present invention,
a stroke is defined to be a straight line parametrized in some
fashion. A preferred parametrization of the straight line
strokes is by the four dimensional vector [mx, my, len, ang]
where mx is the x coordinate of the mid point of the stroke,
my is the y coordinate of the mid point of the stroke, len is
the length of the stroke, and ang is the angle made by the
straight line stroke with respect to some reference axis.
However, other definitions and parametrizations of strokes
may be used.

In order to use the dynamic programming technique, the
distance between two straight line strokes needs to be
defined. A preferred stroke distance to use between two
strokes parameterized as [mx1, my1, len1, ang1] and [mx2,
my2, len2, ang2] is:

stroke distance = w_x abs(mx1-mx2) + w_y abs(my1-my2) +
w_l abs(len1-len2) + w_a cabs(ang1-ang2).

The quantities w_x, w_y, w_l and w_a are the weights
associated with different dimensions of vector describing a

FIG. 11 is a flow chart describing the module that re-sorts
the first detailed match list of candidate characters. In FIG.
11, the index i refers to the i th entry in the first detailed
match list of candidate characters produced by the dynamic
programming based matching module (1102). The index of
the character stored as the i th entry in the first detailed
match list is denoted by li. The index j refers to the j th
template of character li and T(li) is the number of templates
stored for character li. The symbol L is used to denote the
size of the first detailed match list. The quantity ms is the
minimum weighted dynamic programming matching
distance between the input and all the templates of one
character. The minimum weighted dynamic programming
matching distance is initialized to a large number at the start
of each new character in the first detailed match list (1104).
Re-sorting the first detailed match list starts by attempting to
compute a minimum weighted dynamic programming
matching distance between the input and the first template of
the first character in the first detailed match list. A weighted
dynamic programming matching distance s is computed
between the input and the jth template of character li (see
1106) using the technique of dynamic programming and
then weighting individual stroke errors in order to produce
straight-line stroke. The function abs(x) is the absolute value the final matching distance. Traditional dynamic program- 
of x, and cabs(x) is the absolute value x assuming circular 25 ming methods also produce a "best path" which gives the 
symmetry of x. Note mat there is circular symmetry in the pairing of the input strokes and the strokes in the template, 
stroke angles, since 0 degrees is same as 360 degrees. In the The concept of best path is known to one of ordinary skill 
preferred implementation, me quantities mx, my, len, and in the art The normal dynamic programming matching 
ang (that describe a straight line stroke) are all quantized to distance is simply the sum of the inter-stroke distances along 
be between 0 and 255, so that a single byte (8 bits) can be 30 the best path. In order to get the weighted dynamic pro- 
used to store them. With the 8-bit quantization of the gramming matching distance, a weighted sum of the inter- 
parameters describing a stroke, the preferred weights to use stroke distance along the best path is used The weight used 
for computing the stroke distance is w_x=l, w_y=l, w_J= for each inter-stroke distance in the best path is weight 
1, and w_a=4. stored for each template stroke. The rationale for using a 
The rrunirnum dynamic programming matching distance, 35 weighted dynamic programming matching distance is that 
ms, is updated (1008) based on the newly computed some strokes in a handwritten character may be more 
dynamic programming matching distance s (1006). The consistent than other strokes when multiple instance of the 
steps 1006 and 1008 are repeated until all the T(fi) templates same character are considered Hence the more consistent 
for character fi are exhausted (1012). Once the miiiimum strokes need to be weighted more in order to get robust 
dynamic programming matching distance between the input 40 recognition of handwritten input The minimum weighted 
and the templates for character fi has been computed, the dynamic programming matching distance, ms, is updated 
current first detailed match list of characters that best match (1108) based on the newly computed weighted dynamic 
the input is updated (1014). In addition to the first detailed programming notching distance s (1106). The steps 1106 
match list of candidate characters that best match the input, and 1108 are repeated until all the T0i) templates for 
the minimum dynamic programming matching distances far 45 character li arc exhausted (1U0). Once the minimum 
matching the input with the templates for each character in weighted dynamic programming matching distance between 
the list are also stored. The first detailed match list is sorted the input and the templates for character li has been corn- 
using the nainimum dynamic programming matching dis- putcd (1112), the current sorted list of characters that best 
tances. Given a new character index (1010) and a corre- match the input is updated (1114). In addition to the sorted 
sponding rmnimum dynamic programming matching dis- 50 match list of candidate characters that best match the input, 
tance (1012), the current first detailed match list is updated the minimum weighted dynamic programming matching 
by inserting the new character index into the first detailed distances for matching the input with the templates for each 
match list at the location dictated by the new minimum character in the list are also stored, The final sorted match 
dynamic programming matching distance (1016). If the new list is sorted using the minimum weighted dynamic pro- 
minimum dynamic programming matching distance is 55 gramming matching distances. Given a new character index 
greater than the last entry in the current list of minimum and a corresponding rmnimum weighted dynamic program- 
dynamic programming matching distances, then the current ming matching distance, the current sorted match list is 
first detailed match list does not need to be updated (1018). updated by inserting the new character index (1116) into the 
After updating the current first detailed match list of candi- sorted match list at the location dictated by the new mini- 
date characters that best match the input, the next character 60 mum weighted dynamic programming matching distance. If 
in the fast match list, if one such exists, is considered by the new minimum weighted dynamic rtfogramming match- 
starting again at step 1004. After all the characters in the fast ing distance is greater than the last entry in the current list 
match list have been considered, the first detailed match list of minimum weighted dynamic programming matching 
of candidate characters that best match the input is for- distances, then the current sorted match list does not need to 
warded to the module that resorts the first detailed match list 65 be updated (1118). After updating the current sorted match 
of candidate characters in order to produce the final sorted list of candidate characters that best match the input, the next 
list of candidate characters that best match the input. character in the first detailed match list, if one such exists. 
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is considered by starting again at step 1104. After all the 
characters in the first detailed match list have been 
considered, the sorted match list of candidate characters 
becomes the final sorted list of candidate characters that best 
match the input 
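
The stroke distance, the dynamic programming match and the weighted re-sort described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation. The names cabs, stroke_distance, best_path, matching_distance and update_match_list are hypothetical. The wrap-around period of 256 for the quantized angle and the three allowed path moves (diagonal, vertical, horizontal) are assumptions, since the text fixes neither. The per-template-stroke weights are simply passed in as a list.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

Stroke = Tuple[int, int, int, int]   # (mx, my, len, ang), each quantized to 0..255

W_X, W_Y, W_L, W_A = 1, 1, 1, 4      # preferred weights given in the text
ANGLE_PERIOD = 256                   # assumption: the 8-bit angle wraps at 256


def cabs(x: int, period: int = ANGLE_PERIOD) -> int:
    """Absolute value of x assuming circular symmetry with the given period."""
    d = abs(x) % period
    return min(d, period - d)


def stroke_distance(a: Stroke, b: Stroke) -> int:
    """Weighted sum of per-dimension differences between two strokes."""
    return (W_X * abs(a[0] - b[0]) + W_Y * abs(a[1] - b[1])
            + W_L * abs(a[2] - b[2]) + W_A * cabs(a[3] - b[3]))


def best_path(inp: Sequence[Stroke], tmpl: Sequence[Stroke]) -> List[Tuple[int, int]]:
    """Pair input strokes with template strokes by dynamic programming.

    Assumption: a cell may be reached from its diagonal, upper or left
    neighbour. Both sequences must be non-empty.
    """
    n, m = len(inp), len(tmpl)
    cost = [[0] * m for _ in range(n)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = stroke_distance(inp[i], tmpl[j])
            preds = [(p, q) for p, q in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                     if p >= 0 and q >= 0]
            if preds:
                p, q = min(preds, key=lambda c: cost[c[0]][c[1]])
                back[i][j] = (p, q)
                d += cost[p][q]
            cost[i][j] = d
    path, cell = [], (n - 1, m - 1)
    while cell is not None:          # walk back from the last pair to the first
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    return path[::-1]


def matching_distance(inp: Sequence[Stroke], tmpl: Sequence[Stroke],
                      stroke_weights: Optional[Sequence[float]] = None) -> float:
    """Sum of inter-stroke distances along the best path.

    If stroke_weights is given (one weight per template stroke), each term
    is multiplied by the weight of its template stroke.
    """
    total = 0
    for i, j in best_path(inp, tmpl):
        w = 1 if stroke_weights is None else stroke_weights[j]
        total += w * stroke_distance(inp[i], tmpl[j])
    return total


def update_match_list(match_list: list, char_index: int, ms: float,
                      max_size: int) -> None:
    """Keep (distance, character index) pairs sorted, at most max_size long."""
    if len(match_list) >= max_size and ms > match_list[-1][0]:
        return                       # worse than the last entry: no update
    bisect.insort(match_list, (ms, char_index))
    del match_list[max_size:]
```

In this sketch, the minimum distance ms for one character would be the smallest matching_distance over that character's templates, and update_match_list would then be called once per character with that value.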

Those skilled in the art will find many embodiments of the 
present invention to be useful. One obvious advantage is 
ease of data, or text, input over traditional key-board entry 
methods, including the obvious advantage of the ease of 
entry of scanned or "off-line" handwritten input into printed 
data, or text. Another obvious advantage is recognition of 
handwritten input where the input represents a single 
character, data point, or other unitary identifying graphical
representation, that is a subset of a large complex set of
characters, data points, or other graphical representation. 

It will be apparent to those skilled in the art that the 
disclosed invention may be modified in numerous ways and 
may assume many embodiments other than the preferred 
forms particularly set out and described above. Accordingly, 
it is intended by the appended claims to cover all modifi- 
cations of the present invention that fall within the true spirit 
and scope of the present invention and its equivalents. 

What is claimed is: 

1. A method of recognizing a handwritten symbol com- 
prising the steps of: 

(A) receiving a handwritten input as a sequence of (x, y,
pen) points where x and y are coordinates in a two
dimensional coordinate system and pen is a binary 
value indicating an associated pen-down state; 

(B) converting the sequence of points into a sequence of 
strokes, each stroke representing a basic unit used to 
construct any handwritten symbol in a vocabulary; 

(C) determining a shortlist of candidate symbols in the
vocabulary that are likely matches for the input by 
finding a fast matching distance between the input 
sequence of strokes and the sequence of strokes for 
each symbol in the vocabulary, said sequence of strokes 
for each symbol in the vocabulary being derived from 
statistical analysis on samples of the handwritten sym- 
bols; and 

(D) determining a final sorted list of candidate symbols in 
the vocabulary that are likely matches for the input by 
finding a minimum dynamic programming matching 
distance between the input sequence of strokes and the 
sequence of strokes for each symbol in the shortlist of 
candidate symbols in the vocabulary that are likely 
matches for the handwritten input.

2. The method of claim 1, further comprising the steps of 
computing a dynamic programming match path and calcu- 
lating a weighted sum of inter-stroke distances along the 
dynamic programming match path to find the final sorted list 
of candidate symbols in the vocabulary that are likely 
matches for the handwritten input. 

3. The method of claim 1, further comprising the step of 
providing a selectable list of one or more candidate charac- 
ters having the highest likelihood of corresponding to the 
handwritten input. 

4. The method of claim 3, wherein the number of candi- 
date characters of the selectable list is variable. 

5. The method of claim 3, wherein the number of candidate
characters of the selectable list is determined by a user.

6. The method of claim 3, wherein the number of candidate
characters of the selectable list is ten.

7. The method of claim 3, wherein the handwritten input 
is displayed and the at least one candidate character is 
displayed simultaneously. 

8. The method of claim 3, wherein the handwritten input
is displayed and said selectable list of candidate characters
is displayed simultaneously.

9. An apparatus for recognizing a handwritten symbol
comprising:

(A) a digitizing tablet for receiving handwritten input, 
said handwritten input having a sequence of (x, y, pen)
points where x and y coordinates are in a two dimen- 
sional coordinate system and pen is a binary value 
indicating an associated pen-up/pen-down state; 

(B) a memory for storing an input sequence of strokes, 
said sequence of strokes representing the sequence of 
points, each stroke representing a basic unit used to 
construct at least one handwritten symbol of a character 
set; 

(C) a processor for generating a shortlist of candidate 
symbols of the character set that most likely represent the
input by finding a fast matching distance between the 
input sequence of strokes and a sequence of strokes 
representing each symbol in a vocabulary, said 
sequence of strokes representing each symbol in the 
vocabulary being derived from statistical analysis of a 
plurality of samples of handwritten symbols; and 

(D) a processor for generating a final sorted list of 
candidate symbols of the vocabulary which most likely 
represents the handwritten input by finding a detailed 
matching distance between the input sequence of 
strokes and the sequence of strokes for each symbol in 
the said shortlist of candidate symbols in the vocabu- 
lary that most likely represents the handwritten input. 

10. The apparatus of claim 9, further comprising a display 
having a selectable list of one or more candidate characters 
having the highest likelihood of corresponding to the hand- 
written input. 

11. The apparatus of claim 10, wherein the number of 
candidate characters of the selectable list is variable. 

12. The apparatus of claim 10, wherein the number of 
candidate characters of the selectable list is determined by a 
user. 

13. The apparatus of claim 10, wherein the number of
candidate characters of the selectable list is ten. 

14. The apparatus of claim 10, wherein the handwritten 
input is displayed and the selectable list is displayed simul- 
taneously. 

15. The apparatus of claim 10, wherein the handwritten 
input is displayed and said selectable list of candidate 
characters is displayed simultaneously. 

16. A method of recognizing handwritten input compris- 
ing the steps of: 

receiving handwritten input; 

preprocessing the handwritten input to represent a string 
of strokes; 

analyzing the handwritten input by matching of the string 
of strokes and a sequence of strokes for a plurality of 
symbols in a vocabulary to provide a shortlist of 
candidate characters which most likely correspond to 
the handwritten input; 

accessing a plurality of reference templates corresponding 
to only the candidate characters of the shortlist and 
comparing the handwritten input with at least one of 
said reference templates to identify at least one candi- 
date character most likely to represent said handwritten 
input.

17. A method of recognizing handwritten input compris- 
ing the steps of: 

analyzing handwritten input to provide a plurality of 
candidate characters that most likely represent the 
handwritten input by finding a fast matching distance 
between an input sequence of strokes and a sequence of 
strokes for each symbol in a vocabulary of characters;

accessing a plurality of reference templates corresponding
to the candidate characters and comparing the hand- 
written input with at least one of said reference tem- 
plates to identify at least one of the candidate characters 
that most likely represents said handwritten input by 
finding a detailed matching distance between the input 
sequence of strokes and a sequence of strokes for each 
one of said reference templates; 

providing a selectable list of a plurality of candidate 
characters, said list having candidate characters having 
the highest likelihood of corresponding to the hand- 
written input. 

18. The method of claim 17, wherein the handwritten 
input has been preprocessed to represent a string of strokes. 

19. The method of claim 17, wherein the number of
candidate characters of the selectable list is variable. 

20. The method of claim 17, wherein the number of 
candidate characters of the selectable list is determined by a 
user. 

21. The method of claim 17, wherein the number of
candidate characters of the selectable list is ten. 

22. The method of claim 17, wherein the handwritten 
input is displayed juxtaposed at least one candidate charac- 
ter. 

23. The method of claim 17, wherein the handwritten
input is displayed in close juxtaposed said selectable list of 
candidate characters. 

24. A method of recognizing handwritten input compris- 
ing the steps of: 

analyzing a preprocessed sequence of strokes represent- 
ing handwritten input to provide a short list of a 
plurality of candidate characters having a likelihood of 
corresponding to said preprocessed handwritten input; 

analyzing said handwritten input and said short list of said 
plurality of candidate characters by comparison of each 
of said sequence of strokes of the handwritten input 
with at least one reference template having a sequence 
of strokes corresponding to each of said plurality of 
candidate characters in said short list; 

providing a selectable list of at least one candidate 
character, said list of candidate characters having the 
highest likelihood of corresponding to the handwritten 
input.

25. A method of recognizing handwritten input compris- 
ing the steps of: 

receiving preprocessed handwritten input as a sequence of 
strokes; 

providing a shortlist of at least one of a plurality of 
candidate characters, by comparison of said handwrit- 
ten input with a plurality of reference templates, each 
of said templates having a sequence of strokes repre- 
senting a corresponding candidate character; 

providing, by selection from said shortlist, a selectable list 
of at least one of a plurality of candidate characters,
said list of candidate characters having the highest 
likelihood of corresponding to the handwritten input. 

26. An apparatus for recognizing handwritten input
comprising:

a memory for storing preprocessed handwritten input as a 
sequence of strokes; 

a processor generating a shortlist of candidate characters, 
said shortlist generated by fast matching of said pre- 
processed handwritten input with a plurality of refer- 
ence templates, each of said templates having a 
sequence of strokes representing a corresponding can- 
didate character; 



a processor generating a selectable list of at least one of 
a plurality of candidate characters, by detailed match- 
ing of said preprocessed handwritten input with refer- 
ence templates for said shortlist of candidate 
characters, whereby said selectable list of candidate 
characters has the highest likelihood of corresponding 
to the handwritten input.

27. A method of recognizing a handwritten symbol com- 
prising the steps of: 

receiving a handwritten input as a sequence of (x, y,
pen) points where x and y are coordinates in a two
dimensional coordinate system and pen is a binary 
value indicating an associated pen-down state; 
converting the sequence of points into a sequence of 
strokes, each stroke representing a basic unit used to 
construct any handwritten symbol in a vocabulary; 
parametrizing each stroke using a four dimensional vector
having an x coordinate of the midpoint of the stroke, a 
y coordinate of the midpoint of the stroke, a length 
representing the length of the stroke, and an angle 
representing the angle of the stroke with respect to a 
reference axis, 
determining a shortlist of candidate symbols in the 
vocabulary that are likely matches for the input by 
finding a fast matching distance between the input 
sequence of strokes and the sequence of strokes for 
each symbol in the vocabulary, said sequence of strokes 
for each symbol in the vocabulary being derived from 
statistical analysis on samples of the handwritten sym- 
bols; and 

determining a final sorted list of candidate symbols in the 
vocabulary that are likely matches for the input by 
finding a detailed matching distance between the input 
sequence of strokes and the sequence of strokes for 
each symbol in the shortlist of candidate symbols in the 
vocabulary that are likely matches for the handwritten
input.

28. A method of recognizing a handwritten symbol
comprising the steps of:

providing a vocabulary of handwritten symbols, each 
symbol represented as a sequence of strokes, each 
stroke representing a basic unit used to construct any 
handwritten symbol in the vocabulary, said sequence of 
strokes for each symbol in the vocabulary being 
derived from statistical analysis on samples of the 
handwritten symbols; 
receiving a handwritten input as a sequence of (x, y, pen) 
points where x and y are coordinates in a two dimen- 
sional coordinate system and pen is a binary value 
indicating an associated pen-down state; 
converting the sequence of points into a sequence of 
strokes; 

parametrizing each stroke using a vector having an x
coordinate of the midpoint of the stroke, a y coordinate 
of the midpoint of the stroke, a length representing the 
length of the stroke, and an angle representing the angle 
of the stroke with respect to a reference axis; 
associating different weighting values to different strokes, 
said weighting values being derived from statistical 
analysis on a plurality of handwritten symbols; 
determining a list of candidate symbols in the vocabulary
that are likely matches for the input by finding a 
matching distance between the input sequence of 
strokes and the sequence of strokes for each symbol in 
the vocabulary, taking account of the weighting values.

29. The method of claim 28, further comprising calculating
a stroke distance between two or more straight line
strokes such that the stroke distance is equivalent to

w_x abs(mx1-mx2) + w_y abs(my1-my2) + w_l abs(len1-len2)
+ w_a cabs(ang1-ang2)

where mx1 and mx2 are the x coordinates of the mid-point
of two strokes, my1 and my2 are the y coordinates of the
midpoint of the two strokes, len1 and len2 are the lengths of
the two strokes, ang1 and ang2 are the angles of the two
strokes with respect to a common reference, and w_x, w_y,
w_l and w_a are weighting values.

30. A method, comprising the steps of: 

receiving handwritten input as data representing a 
sequence of strokes; 

determining a shortlist of a plurality of candidate symbols 
from stored templates that are likely matches for the 
handwritten input by comparing one or more stroke 
parameters between the sequence of strokes represent- 
ing the handwritten input and the sequence of strokes 
for a plurality of symbols from the templates; and 

determining one or more recognized symbols that are 
likely matches for the handwritten input by comparing 
two or more stroke parameters between the sequence of 
strokes representing the handwritten input and the 
sequence of strokes for each symbol in the shortlist that 
are likely matches for the handwritten input.

31. A method, comprising the steps of: 

processing handwritten input as a sequence of handwrit- 
ten strokes to provide data representing a sequence of 
straight strokes; 

determining a shortlist of a plurality of candidate symbols
from stored templates that are likely matches for the 
handwritten input by comparing one or more stroke 
parameters between the sequence of straight strokes 
representing the handwritten input and the sequence of 
strokes for a plurality of symbols from the templates; 
and 

determining one or more recognized symbols that are 
likely matches for the handwritten input by comparing 
two or more stroke parameters between the sequence of 
straight strokes representing the handwritten input and 
the sequence of strokes for each symbol in the shortlist 
that are likely matches for the handwritten input. 

32. A method, comprising the steps of: 

receiving handwritten input as data representing a 
sequence of strokes; 

determining an angle parameter for each of the sequence
of strokes, the angle parameter representing an angle of 
the stroke from a reference axis; 

determining a shortlist of a plurality of candidate symbols 
from stored templates that are likely matches for the 
handwritten input by comparing the angle parameter 
between the sequence of strokes representing the hand- 
written input and the sequence of strokes for a plurality 
of symbols from the templates; and 

determining one or more recognized symbols that are 
likely matches for the handwritten input by comparing 
other stroke parameters between the sequence of 
strokes representing the handwritten input and the 
sequence of strokes for each symbol in the shortlist that 
are likely matches for the handwritten input. 

33. A method, comprising the steps of: 

receiving handwritten input as data representing a 
sequence of strokes;

determining stroke parameters for each of the sequence of
strokes, the stroke parameters selected from the group
of parameters consisting of:

an angle parameter representing an angle of the stroke
from a reference axis;
a x coordinate midpoint parameter representing the x
coordinate of the midpoint of the stroke;
a y coordinate midpoint parameter representing the y
coordinate of the midpoint of the stroke;
a length parameter representing stroke length;

determining a plurality of candidate symbols from stored 
templates that are likely matches for the handwritten 
input by comparing one stroke parameter between the 
sequence of strokes representing the handwritten input 
and the sequence of strokes for a plurality of symbols 
from the templates; and

determining one or more recognized symbols that are
likely matches for the handwritten input by comparing
two or more stroke parameters between the sequence of
strokes representing the handwritten input and the
sequence of strokes for each symbol in the plurality of
candidate symbols that are likely matches for the
handwritten input.

34. An apparatus, comprising:

a digitizing tablet for receiving handwritten input as a
sequence of strokes;

a memory having data and instructions stored therein and
having a plurality of templates representing characters
or symbols some of which may correspond to the
handwritten input;

a processor for processing the data or instructions in the
memory to provide a shortlist of a plurality of candidate
symbols by comparing at least one stroke parameter
between the sequence of strokes representing the
handwritten input and a sequence of strokes for one or more
characters or symbols in the memory, and for providing
a selectable plurality of recognized symbols by comparing
two or more stroke parameters between the
sequence of strokes representing the handwritten input
and the sequence of strokes for each of the plurality of
candidate symbols of the shortlist.

35. A method of recognizing a handwritten symbol com- 
prising the steps of: 

(A) receiving a handwritten input as a sequence of (x, y) 
points where x and y are coordinates in a two dimen-
sional coordinate system; 

(B) converting the sequence of points into a sequence of 
strokes, each stroke representing a basic unit used to 
construct any handwritten symbol in a vocabulary; 

(C) determining a shortlist of candidate symbols in the
vocabulary that are likely matches for the input by
finding a fast matching distance between the input
sequence of strokes and the sequence of strokes for
each symbol in the vocabulary, said sequence of strokes
for each symbol in the vocabulary being derived from
statistical analysis on samples of the handwritten
symbols, said step of determining a shortlist including
comparing a difference between a number of strokes in
an input sequence of strokes and a number of strokes in
a template with a threshold; and

(D) determining a final sorted list of candidate symbols in
the vocabulary that are likely matches for the input by
finding a minimum dynamic programming matching
distance between the input sequence of strokes and the
sequence of strokes for each symbol in the shortlist of
candidate symbols in the vocabulary that are likely
matches for the handwritten input.

36. A method of recognizing a handwritten symbol
comprising the steps of:

(A) receiving a handwritten input as a sequence of (x, y) 
points where x and y are coordinates in a two dimen- 
sional coordinate system; 

(B) converting the sequence of points into a sequence of 
strokes, each stroke representing a basic unit used to 
construct any handwritten symbol in a vocabulary; 

(C) determining a shortlist of candidate symbols in the
vocabulary that are likely matches for the input by 
finding a fast matching distance between first and 
second sets of strokes, where one of the first and second 
sets of strokes is the input sequence of strokes and the 
other is the sequence of strokes for each symbol in the 
vocabulary, said sequence of strokes for each symbol in
the vocabulary being derived from statistical analysis
on samples of the handwritten symbols, said step of 
finding a fast matching distance including calculating 
distances between a stroke in one of the sets of strokes 
and neighboring strokes in the other of the sets of 
strokes; and 

(D) determining a final sorted list of candidate symbols in 
the vocabulary that are likely matches for the input by 
finding a minimum dynamic programming matching
distance between the input sequence of strokes and the 
sequence of strokes for each symbol in the shortlist of 
candidate symbols in the vocabulary that are likely 
matches for the handwritten input.